Skip to content

gh-149590: Remove faulthandler_traverse#150023

Merged
JelleZijlstra merged 3 commits into
python:mainfrom
armaan-v924:armaan/issue-149590-multi-instance-faulthandler
May 18, 2026
Merged

gh-149590: Remove faulthandler_traverse#150023
JelleZijlstra merged 3 commits into
python:mainfrom
armaan-v924:armaan/issue-149590-multi-instance-faulthandler

Conversation

@armaan-v924
Copy link
Copy Markdown
Contributor

@armaan-v924 armaan-v924 commented May 18, 2026

tldr; faulthandler_traverse visits Python objects owned by _PyRuntime, not by the module instance. With multi-phase init allowing multiple module instances, each instance's GC traversal decrements gc_refs on the same runtime-owned objects, driving it negative when two instances are collected simultaneously. Cleanup is already handled well, traversal is redundant, and harmless with a single instantiation.

Expanded Explanation:

Author's note: This is my first foray into the CPython codebase. Would appreciate all the scrutiny :D

RCA:

BG: faulthandler uses multi-phase init, so deleting it from sys.modules and re-importing produces a second, independent module object. Both module objects are tracked by the garbage collector and both have md_state_traverse = faulthandler_traverse stored.

  1. subtract_refs iterates every object in the GC generation list.
  2. For each object it calls Py_TYPE(op)->tp_traverse — for module objects that's module_traverse in Objects/moduleobject.c
  3. module_traverse calls md_state_traverse (i.e. faulthandler_traverse) then visits md_dict
  4. faulthandler_traverse calls Py_VISIT(fatal_error.file)(expands to visit_decref(_PyRuntime.faulthandler.fatal_error.file))
  5. With two module instances this runs twice; gc_refs on fatal_error.file (stderr, by default) is decremented twice
  6. But fatal_error.file's real refcount only has one reference from _PyRuntime; gc_refs goes negative → debug assertion fires, silent UAF in release

Safety

m_traverse exists to tell the GC about Python objects owned by per-module C state (md_state, allocated when m_size > 0). faulthandler has m_size = 0, so there is no per-module C state. fatal_error.file, thread.file, and user_signals[signum].file are all owned by _PyRuntime, not by any module instance. _PyRuntime is not a Python object and is never in containers, so its references are never passed to visit_decref, they contribute to ob_refcnt but are never subtracted, meaning gc_refs >= 1 after subtract_refs. GC will never mark these objects as unreachable regardless of whether the module traverse visits them. Cleanup is already handled correctly: faulthandler_disable() calls Py_CLEAR(fatal_error.file) (and equivalents), and _PyFaulthandler_Fini() calls faulthandler_disable() at shutdown. No GC involvement is needed.

Functionality

From main:
image

After removal:
image

`faulthandler_traverse` visits Python objects owned by `_PyRuntime`, not
by the module instance. With multi-phase init allowing multiple module
instances, each instance's GC traversal decrements `gc_refs` on the same
runtime-owned objects, driving it negative when two instances are
collected simultaneously.
@bedevere-app
Copy link
Copy Markdown

bedevere-app Bot commented May 18, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@python-cla-bot
Copy link
Copy Markdown

python-cla-bot Bot commented May 18, 2026

All commit authors signed the Contributor License Agreement.

CLA signed

@armaan-v924 armaan-v924 marked this pull request as ready for review May 18, 2026 20:48
Comment thread Misc/NEWS.d/next/Core_and_Builtins/2026-05-18-13-47-17.gh-issue-149590.IPBeQx.rst Outdated
@JelleZijlstra JelleZijlstra added needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes needs backport to 3.15 pre-release feature fixes, bugs and security fixes labels May 18, 2026
@JelleZijlstra JelleZijlstra merged commit 5673748 into python:main May 18, 2026
59 checks passed
@miss-islington-app
Copy link
Copy Markdown

Thanks @armaan-v924 for the PR, and @JelleZijlstra for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14, 3.15.
🐍🍒⛏🤖

@miss-islington-app
Copy link
Copy Markdown

Sorry, @armaan-v924 and @JelleZijlstra, I could not cleanly backport this to 3.14 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 56737483c2ffdaadfec648fd38d409c6b10941c0 3.14

@bedevere-app
Copy link
Copy Markdown

bedevere-app Bot commented May 18, 2026

GH-150037 is a backport of this pull request to the 3.15 branch.

@bedevere-app bedevere-app Bot removed the needs backport to 3.15 pre-release feature fixes, bugs and security fixes label May 18, 2026
@miss-islington-app
Copy link
Copy Markdown

Sorry, @armaan-v924 and @JelleZijlstra, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 56737483c2ffdaadfec648fd38d409c6b10941c0 3.13

@JelleZijlstra
Copy link
Copy Markdown
Member

Thanks @armaan-v924! Are you able to create the backport PRs for 3.13 and 3.14? It should be possible with cherry-picker plus some small merge conflict fixes.

JelleZijlstra pushed a commit that referenced this pull request May 18, 2026
gh-149590: Remove faulthandler_traverse (GH-150023)

`faulthandler_traverse` visits Python objects owned by `_PyRuntime`, not
by the module instance. With multi-phase init allowing multiple module
instances, each instance's GC traversal decrements `gc_refs` on the same
runtime-owned objects, driving it negative when two instances are
collected simultaneously.
(cherry picked from commit 5673748)

Co-authored-by: Armaan Vakharia <43391096+armaan-v924@users.noreply.github.com>
@armaan-v924
Copy link
Copy Markdown
Contributor Author

Thanks @armaan-v924! Are you able to create the backport PRs for 3.13 and 3.14? It should be possible with cherry-picker plus some small merge conflict fixes.

Yea, for sure. I'll take care of it tomorrow.

@bedevere-app
Copy link
Copy Markdown

bedevere-app Bot commented May 19, 2026

GH-150087 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app Bot removed the needs backport to 3.13 bugs and security fixes label May 19, 2026
@bedevere-app
Copy link
Copy Markdown

bedevere-app Bot commented May 19, 2026

GH-150088 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app Bot removed the needs backport to 3.14 bugs and security fixes label May 19, 2026
JelleZijlstra pushed a commit that referenced this pull request May 19, 2026
`faulthandler_traverse` visits Python objects owned by `_PyRuntime`, not
by the module instance. With multi-phase init allowing multiple module
instances, each instance's GC traversal decrements `gc_refs` on the same
runtime-owned objects, driving it negative when two instances are
collected simultaneously.
(cherry picked from commit 5673748)
JelleZijlstra pushed a commit that referenced this pull request May 19, 2026
`faulthandler_traverse` visits Python objects owned by `_PyRuntime`, not
by the module instance. With multi-phase init allowing multiple module
instances, each instance's GC traversal decrements `gc_refs` on the same
runtime-owned objects, driving it negative when two instances are
collected simultaneously.
(cherry picked from commit 5673748)
@armaan-v924 armaan-v924 deleted the armaan/issue-149590-multi-instance-faulthandler branch May 19, 2026 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants